Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Optimize pmovmaskb with a named vector constant #3497

Merged
merged 2 commits into from
Mar 18, 2024

Conversation

Sonicadvance1
Copy link
Member

I was looking at some other JIT overheads and this cropped up as some
overhead. Instead of materializing a constant using mov+movk+movk+movk,
load it from the named vector constant array.

In a micro-benchmark this improved performance by 34%.
In bytemark this improved on subbench by 0.82%

I was looking at some other JIT overheads and this cropped up as some
overhead. Instead of materializing a constant using mov+movk+movk+movk,
load it from the named vector constant array.

In a micro-benchmark this improved performance by 34%.
In bytemark this improved on subbench by 0.82%
Copy link
Collaborator

@alyssarosenzweig alyssarosenzweig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. It feels like we "ought" to be able to generate the constant with a bytewise vectorized (1 << lane ID) but I can't figure out how to actually do that in neon any faster than what we had here so... approved given the positive benchmark data :)

@JosJuice
Copy link

I can't think of a good way to do that in NEON either. SVE does have the INDEX instruction, but even with that, generating the 64-bit constant would take three instructions. So it doesn't seem like it would be better than this PR's approach.

@Sonicadvance1 Sonicadvance1 merged commit ab8ee64 into FEX-Emu:main Mar 18, 2024
10 checks passed
@Sonicadvance1 Sonicadvance1 deleted the movmaskb_constant branch March 18, 2024 23:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants